Tianhang Zheng

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

May 28, 2026
Via arXiv

RouteScan: A Non-Intrusive Approach to Auditing MoE LLMs Safety via Expert Routing Telemetry

May 24, 2026
Via arXiv

Reasoning as an Attack Surface: Adaptive Evolutionary CoT Jailbreaks for LLMs

May 23, 2026
Via arXiv

Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training

Apr 27, 2026
Via arXiv

Mitigating Error Amplification in Fast Adversarial Training

Apr 27, 2026
Via arXiv

Accelerating Suffix Jailbreak attacks with Prefix-Shared KV-cache

Mar 12, 2026
Via arXiv

MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety

Feb 02, 2026
Via arXiv

Attack-Resistant Watermarking for AIGC Image Forensics via Diffusion-based Semantic Deflection

Jan 10, 2026
Via arXiv

DualBreach: Efficient Dual-Jailbreaking via Target-Driven Initialization and Multi-Target Optimization

Apr 21, 2025
Via arXiv

Nearly Optimal Differentially Private ReLU Regression

Mar 08, 2025
Via arXiv